ABSTRACT

The primary purpose of the present study was to 
investigate the appropriateness of several tests of significance for 
use with interrupted time series data. The second purpose was to 
determine what effect the violation of the assumption of uncorrelated
error would have on the three tests of significance. The three tests
were the Mood test, Walker-Lev Test 3, and the Double Extrapolation
Technique. The procedure was basically that of generating a large 
number of time series having specified characteristics and performing 
the tests of significance on each generated time series. The results 
of the study indicated that the three tests of significance are 
appropriate for use on data of interrupted time series form. Tables 
and figures illustrate the text. (Author/DB) 
A Study of the Effect of Proximally Autocorrelated Error
on Tests of Significance for the Interrupted
Time-Series Quasi-Experimental Design

Joyce Sween and Donald T. Campbell 
Northwestern University 

The time series experiment has long been a common research design in
the biological and physical sciences. However, with psychological and
sociological data certain problems of analysis and interpretation occur when
the interrupted time series design is employed (Campbell, 1963; Campbell &
Stanley, 1962; Holtzman, 1963).

One of the problems which is of particular concern to the social
scientist (for whom the magnitude and clarity of effects is not always as
clear-cut as in the biological and physical sciences) is testing for the
significance of the change, X. It is desired to have some statistical test
of significance that would distinguish the effect of an intervening event or
experimental variable from purely random fluctuation. Although there are no
tests of significance that are completely suitable to the time series
situation, several possibilities for such a test do exist and merit
consideration (Campbell, 1963; Campbell & Stanley, 1962).

Another problem of fundamental concern to the social scientist is the
possibility of sequential dependency in successive observations of the time
series. Measurements made at time points which are closer together may be
more strongly related than measurements made farther apart in time.
Statistical models on which the tests of significance are based generally
assume that the observed values exhibit independent error. Thus, the
significant autocorrelation which exists when a given observation is
dependent upon the observations preceding it violates the assumption of
independence in these tests. Although this circumstance does not prevent
estimation (by least squares methods) of the regression coefficients which
the various tests of significance use, the classical formulas for the mean
errors of the estimated coefficients are in actuality no longer valid. When
successive observations through time are correlated, perhaps to a lag of
several points, use of existing tests of significance may be inappropriate.

The primary purpose of the present study was to investigate the
appropriateness of several tests of significance for use with interrupted
time series data. The three main tests were those described by Mood (1950,
pp. 297-298; pp. 350-358) and Walker and Lev (1953, pp. 390-395; pp. 399-400).

The interrupted time series experiment as considered in the present 
study consists of a finite sequence of observations which are real-valued 
measurements of some individual or group obtained at n successive equally 
spaced points in time. An experimental change is then introduced or a major 
change of conditions occurs at some point in the time series. In the pres- 
ent study, the experimental change or treatment is assumed to occur midway 
between two consecutive measurements. It can be diagrammed as follows: 



Interrupted Time Series Design

Observation at time t_i:   y_1   y_2   ...   y_m   |   y_{m+1}   y_{m+2}   ...   y_n
Time:                      t_1   t_2   ...   t_m   |   t_{m+1}   t_{m+2}   ...   t_n
                                                   ^
                                          experimental change

The $y_i$ are the measurements recorded at time $t_i$. The values $y_i$
($i = 1, 2, \ldots, m$) obtained prior to the treatment are referred to as
pre-change values. The values $y_i$ ($i = m+1, m+2, \ldots, n$) obtained after
the treatment are post-change values. If the treatment has produced an effect,
a discontinuity of measurements is recorded in the time series. The
statistical tests of significance which were investigated in the present study
as possible techniques for distinguishing such a treatment effect from purely
random fluctuation are described in detail below:

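(a) Mood Test. This is a t test (Mood, 1950, pp. 297-298) for the
significance of the departure of the first post-change observation from a
value predicted by a linear fit of the pre-change observations. The formula
for t is

$$t = \frac{y_0 - \hat{y}_0}{\hat{\sigma}_{(y_0 - \hat{y}_0)}}$$

The estimate, $\hat{y}_0$, of the first post-change value is obtained from the
pre-change regression on time, $\hat{y}_0 = \hat{a} + \hat{b}\, t_0$, where
$t_0$ here denotes the time of the first post-change observation and

$$\hat{b} = \frac{\sum_i (t_i - \bar{t})(y_i - \bar{y})}{\sum_i (t_i - \bar{t})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{t}$$

where $m$ = number of pre-change points

$i = 1, 2, \ldots, m$

$\bar{y} = \sum_i y_i / m$

$\bar{t} = \sum_i t_i / m$

The difference between the estimated value and the obtained first post-change
value, $y_0$, is used in the t test. Since the variance of this difference is

$$\sigma^2 \left[ 1 + \frac{1}{m} + \frac{(t_0 - \bar{t})^2}{\sum_i (t_i - \bar{t})^2} \right]$$

and an estimate, $\hat{\sigma}^2$, of $\sigma^2$ is given by
$\sum_i (y_i - \hat{a} - \hat{b}\, t_i)^2 / (m - 2)$, the computation formula
for the t test of the difference between the obtained and predicted
post-change point is given by

$$t = \frac{y_0 - \hat{y}_0}{\sqrt{\dfrac{\sum_i (y_i - \hat{a} - \hat{b}\, t_i)^2}{m - 2} \left[ 1 + \dfrac{1}{m} + \dfrac{(t_0 - \bar{t})^2}{\sum_i (t_i - \bar{t})^2} \right]}}$$

The denominator is the standard error of the difference for t. The df for
t is (m-2). The significance of the obtained t is evaluated by reference to
a standard t table.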
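
The Mood test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' IBM 709 program: the function name `mood_test` is hypothetical, and it assumes equally spaced occasions coded 1, 2, ..., m for the pre-change points with the first post-change point at time m + 1.

```python
import math


def mood_test(pre_values, first_post_value):
    """Return (t, df) for the Mood test.

    ASSUMPTION: occasions are equally spaced and coded 1..m for the
    pre-change values; the first post-change value falls at time m + 1.
    """
    m = len(pre_values)
    if m < 3:
        raise ValueError("need at least three pre-change points")
    times = list(range(1, m + 1))
    t_bar = sum(times) / m
    y_bar = sum(pre_values) / m
    # Sums of squares and cross-products about the pre-change means.
    c_tt = sum((t - t_bar) ** 2 for t in times)
    c_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, pre_values))
    slope = c_ty / c_tt
    intercept = y_bar - slope * t_bar
    # Residual variance about the pre-change line, with m - 2 df.
    resid_ss = sum(
        (y - (intercept + slope * t)) ** 2 for t, y in zip(times, pre_values)
    )
    sigma2_hat = resid_ss / (m - 2)
    t0 = m + 1
    # Standard error of (observed - predicted) for a single new observation.
    # A perfectly linear pre-change series gives zero error and no defined t.
    std_err = math.sqrt(sigma2_hat * (1 + 1 / m + (t0 - t_bar) ** 2 / c_tt))
    predicted = intercept + slope * t0
    return (first_post_value - predicted) / std_err, m - 2
```

For example, `mood_test([0.0, 1.0, 0.0, 1.0], 3.0)` fits the line y = 0.2t to the four pre-change points, predicts 1.0 for the fifth occasion, and compares the departure of 2.0 with a standard error of 1.0 on 2 df.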

(b) Walker-Lev Tests. Walker and Lev (1953, pp. 390-395) provide the
following tests of significance. These tests are useful in determining
whether differences exist between the regression equations for pre- and
post-change groups.

Test One.

This is a test of the hypothesis of common slope. The F ratio is
given by

$$F = \frac{S_1}{S_2 / (N - 4)}$$

where $S_1$ = sum of squares of the common within groups regression
estimates from the separate group regression estimates

$$S_1 = \sum_i \sum_j (\hat{y}_{w,ij} - \hat{y}_{ij})^2$$

$S_2$ = sum of squares of the obtained occasion values from the
separate group regression estimates

$$S_2 = \sum_i \sum_j (y_{ij} - \hat{y}_{ij})^2$$

$N$ = total number of occasions in time series
$i = 1, 2$ [1 = prechange group; 2 = postchange group]

$j = 1, 2, \ldots, N_i$

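Formulas for the common within groups slope and intercepts are

$$b_w = \frac{\sum_i \sum_j (t_{ij} - \bar{t}_i)(y_{ij} - \bar{y}_i)}{\sum_i \sum_j (t_{ij} - \bar{t}_i)^2}, \qquad a_{w,i} = \bar{y}_i - b_w\, \bar{t}_i$$
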
Formulas for the separate group slopes and intercepts are

$$b_i = \frac{\sum_j (t_{ij} - \bar{t}_i)(y_{ij} - \bar{y}_i)}{\sum_j (t_{ij} - \bar{t}_i)^2}, \qquad a_i = \bar{y}_i - b_i\, \bar{t}_i$$

The numerator and denominator sums of squares for the F ratio are computed
from these estimates, and the resulting F ratio is evaluated with 1 and N-4
degrees of freedom.

Test Two.

This is a test of the hypothesis that the slopes of the pre- and
post-change groups are equal to zero when it has been established that
$b_1 = b_2$ (that is, F of Test One is not significant). The variance ratio is
given by

$$F = \frac{(N - 4)\, C_{ty}^2}{C_{tt}\, C_{yy} - C_{ty}^2}$$

and is evaluated with 1 and N-4 degrees of freedom.

Test Three.

This is a test of the hypothesis that a single regression line fits
both the pre- and post-change groups. The F ratio is given by

$$F = \frac{S_t}{S_w / (N - 3)}$$

where $S_t$ = sum of squares of common within groups regression estimates from
the total series regression estimates

$$S_t = \sum_i \sum_j (\hat{y}_{w,ij} - \hat{y}_{T,ij})^2$$

$S_w$ = sum of squares of obtained occasion values from the common within
groups regression estimates

$$S_w = \sum_i \sum_j (y_{ij} - \hat{y}_{w,ij})^2$$

Formulas for the total series slope and intercept are

$$b_T = \frac{\sum_i \sum_j (t_{ij} - \bar{t})(y_{ij} - \bar{y})}{\sum_i \sum_j (t_{ij} - \bar{t})^2}, \qquad a_T = \bar{y} - b_T\, \bar{t}$$

The numerator and denominator sums of squares for the F ratio are computed
from these estimates, and the resulting F ratio is evaluated with 1 and N-3
degrees of freedom.
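
As an illustration of Test Three, the following Python sketch computes the F ratio directly from the verbal definitions of the two sums of squares given above. The function names are hypothetical, the occasions are assumed to be equally spaced and coded 1, 2, ..., N, and the common within groups fit is taken to be a pooled slope with a separate intercept for each group.

```python
def _sums(times, values):
    """Means plus sums of squares and cross-products about the means."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    c_tt = sum((t - t_bar) ** 2 for t in times)
    c_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return t_bar, y_bar, c_tt, c_ty


def walker_lev_test3(pre, post):
    """Return (F, (df1, df2)) for the single-regression-line hypothesis.

    ASSUMPTION: occasions are equally spaced and coded 1..N.
    """
    pre, post = list(pre), list(post)
    m, n_total = len(pre), len(pre) + len(post)
    t_pre = list(range(1, m + 1))
    t_post = list(range(m + 1, n_total + 1))

    # Common within-groups regression: pooled slope, separate intercepts.
    tb1, yb1, ctt1, cty1 = _sums(t_pre, pre)
    tb2, yb2, ctt2, cty2 = _sums(t_post, post)
    b_w = (cty1 + cty2) / (ctt1 + ctt2)
    a_w = (yb1 - b_w * tb1, yb2 - b_w * tb2)

    # Total-series regression: one line through all N points.
    tb, yb, ctt, cty = _sums(t_pre + t_post, pre + post)
    b_t = cty / ctt
    a_t = yb - b_t * tb

    s_t = 0.0  # within-groups estimates about total-series estimates
    s_w = 0.0  # observed values about within-groups estimates
    for group, (times, values) in enumerate(((t_pre, pre), (t_post, post))):
        for t, y in zip(times, values):
            y_within = a_w[group] + b_w * t
            y_total = a_t + b_t * t
            s_t += (y_within - y_total) ** 2
            s_w += (y - y_within) ** 2

    df2 = n_total - 3
    return s_t / (s_w / df2), (1, df2)
```

With `pre = [0, 1, 0, 1]` and `post = [5, 6, 5, 6]`, a series with an obvious jump in level, the sketch gives S_t = 8.4, S_w = 1.6, and F = 26.25 on 1 and 5 df.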

(c) "Double Extrapolation" Technique. This test is concerned with the
significance of the difference between two separate regression estimates of
the y-value for time $t_0$ which lies midway between the last pre-change point
and the first post-change point. Reference to the Interrupted Time Series
Design diagrammed above will indicate that this point lies midway
between $t_m$ and $t_{m+1}$. Assuming that the points are equally spaced in
time, $t_0 = (t_m + t_{m+1})/2$.

The first regression estimate for $y_0$ is obtained from the pre-change
values; the second regression estimate for $y_0$, from the post-change values.
The formulas for the two regression estimates are

$$b_i = \frac{\sum_j (t_{ij} - \bar{t}_i)(y_{ij} - \bar{y}_i)}{\sum_j (t_{ij} - \bar{t}_i)^2}, \qquad a_i = \bar{y}_i - b_i\, \bar{t}_i$$

where $i = 1, 2$ [1 = prechange group; 2 = postchange group]

$N_i$ = number of occasions in group $i$

$j = 1, 2, \ldots, N_i$

$\bar{y}_i = \sum_j y_{ij} / N_i$

Thus, the two estimates for $y_0$ are given by
$\hat{y}_{0i} = a_i + b_i\, t_0$. The difference between these two estimates
can be evaluated by the t-ratio

$$t = \frac{|\hat{y}_{01} - \hat{y}_{02}|}{S_d}$$

and its significance determined by reference to a t table with
$N_1 + N_2 - 4$ degrees of freedom. The standard error of the difference,
$S_d$, depends upon the variability of the pre- and post-change values, $N_1$
and $N_2$, the relation between t and y, and the distance of $t_0$ from
$\bar{t}_1$ and $\bar{t}_2$.
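
A minimal Python sketch of the Double Extrapolation Technique follows. The formula for $S_d$ is not legible in this copy of the paper, so the sketch makes a loudly stated assumption: it pools the residual variance of the two separately fitted lines on $N_1 + N_2 - 4$ df and uses the usual variance of a regression estimate at $t_0$ for each line. That choice agrees with the ingredients listed above, but it may differ from the authors' computational formula. The function names are hypothetical, and occasions are assumed to be coded 1, 2, ..., N.

```python
import math


def _fit(times, values):
    """Least-squares line; returns slope, intercept, mean time, C_tt, residual SS."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    c_tt = sum((t - t_bar) ** 2 for t in times)
    c_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = c_ty / c_tt
    intercept = y_bar - slope * t_bar
    resid_ss = sum(
        (y - intercept - slope * t) ** 2 for t, y in zip(times, values)
    )
    return slope, intercept, t_bar, c_tt, resid_ss


def double_extrapolation(pre, post):
    """Return (t, df) comparing the two regression estimates at t_0.

    ASSUMPTIONS: equally spaced occasions coded 1..N, so t_0 = m + 0.5;
    S_d uses a pooled residual variance (not confirmed by the source).
    """
    n1, n2 = len(pre), len(post)
    t_pre = list(range(1, n1 + 1))
    t_post = list(range(n1 + 1, n1 + n2 + 1))
    t0 = n1 + 0.5

    b1, a1, tb1, ctt1, ss1 = _fit(t_pre, pre)
    b2, a2, tb2, ctt2, ss2 = _fit(t_post, post)

    df = n1 + n2 - 4
    pooled_variance = (ss1 + ss2) / df
    # Variance of each extrapolated estimate grows with distance from
    # that group's mean time point.
    s_d = math.sqrt(
        pooled_variance
        * (1 / n1 + (t0 - tb1) ** 2 / ctt1 + 1 / n2 + (t0 - tb2) ** 2 / ctt2)
    )
    estimate_pre = a1 + b1 * t0
    estimate_post = a2 + b2 * t0
    return abs(estimate_pre - estimate_post) / s_d, df
```

For `pre = [0, 1, 0, 1]` and `post = [5, 6, 5, 6]` the two extrapolated estimates at $t_0 = 4.5$ are 0.9 and 5.1, and under the assumed $S_d$ the ratio equals the square root of 21 on 4 df.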

The second purpose of the present study was to determine what effect
the violation of the assumption of uncorrelated error would have on the
three tests of significance. Do the tests become inappropriate when there
is positive autocorrelation between points?

The scientist usually does not have knowledge of the degree of sequential
dependency which exists in a given time series. The first serial correlation
coefficient (of lag one) is used to determine whether the observations of a
given time series can be regarded as consisting of independent error only
(measures not correlated). The serial correlation coefficient tests the
hypothesis that the order of dependence in the time series is zero against
the alternative that it is one. Serial correlation coefficients of lags
higher than one can also be considered. (If the serial correlation is that
of a variable with itself, it may be referred to as an autocorrelation
coefficient.)

A serial correlation coefficient of lag one, $r_1$, is obtained by pairing
observations one time unit apart; that is, the first observation is paired
with the second, the second with the third, etc., throughout the entire time
series until the last observation is reached. The product-moment correlation
is computed using the resulting pairs of observations. In a similar manner,
the second serial correlation coefficient, $r_2$, is obtained by pairing
successive observations two time units apart and $r_a$ is obtained by pairing
observations $a$ units apart. The basic formula for $r_a$ is given below:

$$r_a = \frac{n \sum_i y_i\, y_{i+a} - \sum_i y_i \sum_i y_{i+a}}{\sqrt{\left[ n \sum_i y_i^2 - \left( \sum_i y_i \right)^2 \right] \left[ n \sum_i y_{i+a}^2 - \left( \sum_i y_{i+a} \right)^2 \right]}}$$

where $N$ = total number of occasions in the time series
$n = N - a$

$i = 1, 2, \ldots, n$

The model of autocorrelation used in the present study is basically
one of proximity; namely, it was assumed that measurements made closer
together in time would be more strongly related than measurements made
further apart in time. This means that in instances where a significant
autocorrelation $r_a$ exists, increasingly larger autocorrelations should
exist for $r_{a-1}$, $r_{a-2}$, ..., and $r_1$, respectively.

Since the slope of the line contributes to serial dependency between
points, interest in the present study is in the autocorrelations of
departures from the line of best fit for the total series. However, because
true effects increase proximal autocorrelation, differences from the separate
pre- and post-change regression lines give better estimates of existing
autocorrelations in instances of true effects. So as not to penalize true
effects in the data version of the program (a computer program which performs
the tests of significance on input data of interrupted time series form is
presented in Sween and Campbell, 1965), autocorrelation coefficients based
on departures from the separate regression lines are also computed. Although
only departures from the line of best fit for the total series were used in
the present study, in further work autocorrelations based on differences
from separately fitted regression lines will also be utilized to increase
comparability of the Monte Carlo results with actual experimental data.
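
The pairing procedure for the serial correlation coefficient can be sketched as follows. This is an illustrative Python sketch rather than the authors' program: `serial_correlation` applies the ordinary product-moment formula to the n = N - a pairs, and `residuals_from_line` (a hypothetical helper, assuming occasions coded 1, 2, ..., N) produces the departures from the line of best fit for the total series on which the coefficients are based.

```python
import math


def residuals_from_line(values):
    """Departures of each value from the least-squares line on time 1..N."""
    n = len(values)
    times = list(range(1, n + 1))
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    c_tt = sum((t - t_bar) ** 2 for t in times)
    c_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = c_ty / c_tt
    intercept = y_bar - slope * t_bar
    return [y - (intercept + slope * t) for t, y in zip(times, values)]


def serial_correlation(values, lag):
    """Product-moment correlation of observations `lag` time units apart."""
    n = len(values) - lag
    if lag < 1 or n < 2:
        raise ValueError("lag must be >= 1 and leave at least two pairs")
    first = values[:n]   # y_i,      i = 1..n
    second = values[lag:]  # y_{i+a},  i = 1..n
    sum_x, sum_y = sum(first), sum(second)
    sum_xy = sum(x * y for x, y in zip(first, second))
    sum_xx = sum(x * x for x in first)
    sum_yy = sum(y * y for y in second)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    # A constant series makes the denominator zero; r is undefined then.
    return numerator / denominator
```

A strictly alternating series such as `[1, -1, 1, -1, 1, -1]` gives a lag-one coefficient of -1 and a lag-two coefficient of +1, which is a quick check that the pairing is done as described.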

METHOD 

The procedure was basically that of generating a large number of time
series having specified characteristics and performing the tests of
significance on each generated time series. In this way, distributions of t's
and/or F's for the three tests of significance were obtained. The
distributions were then examined to determine how satisfactory each of the
three tests of significance was in terms of the risk ($\alpha$) of rejecting
the null hypothesis (the experimental change X had no effect) when it was
true.

The time series were constructed so that the hypothesis of no effect
was true. Normal random error was added to each "true" point to produce a
time series of N observed values such that
$y_i(\text{observed}) = y_i(\text{true}) + \text{error}_i$ for
$i = 1, \ldots, N$. Two general types of error, independent error and/or
correlated error, could be added to the "true" line values. When the null
hypothesis is true and the assumptions for use of the tests of significance
are met, the theoretical values from the t and F tables should be exceeded
by chance only 1% and 5% of the time. Thus, when only independent error is
added, the discrepancies between the theoretical values and the obtained
percent of significant t's and F's should indicate how suitable the test of
significance is in the interrupted time series situation. In addition, the
obtained percentage of t's and F's which exceed the theoretical values when
sequential dependency between points is built in should indicate how the
violation of the assumption of independence of error may further restrict
the usefulness of the tests.
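
The logic of the Monte Carlo check can be illustrated with a small Python sketch for the simplest case, the Mood test with independent error. It is not the original program: the function names, the seed, the number of replications, and the use of a two-tailed 5% criterion (2.306 for 8 df, from a standard t table) are all assumptions made for the illustration.

```python
import math
import random

# Tabled two-tailed 5% critical value of t for 8 df (m = 10 pre-change points).
T_CRIT_05_DF8 = 2.306


def mood_t(pre, first_post):
    """Mood t for the first post-change value; occasions coded 1..m, m+1."""
    m = len(pre)
    times = range(1, m + 1)
    t_bar = (m + 1) / 2
    y_bar = sum(pre) / m
    c_tt = sum((t - t_bar) ** 2 for t in times)
    c_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, pre))
    slope = c_ty / c_tt
    intercept = y_bar - slope * t_bar
    resid_ss = sum((y - intercept - slope * t) ** 2 for t, y in zip(times, pre))
    t0 = m + 1
    std_err = math.sqrt(
        resid_ss / (m - 2) * (1 + 1 / m + (t0 - t_bar) ** 2 / c_tt)
    )
    return (first_post - (intercept + slope * t0)) / std_err


def null_rejection_rate(n_series=2000, m=10, slope=1.0, sd=1.0, seed=0):
    """Proportion of null series whose |t| exceeds the tabled 5% value.

    Each series is a "true" line plus independent normal error, so the
    hypothesis of no effect is true by construction.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_series):
        series = [slope * t + rng.gauss(0.0, sd) for t in range(1, m + 2)]
        if abs(mood_t(series[:m], series[m])) > T_CRIT_05_DF8:
            rejections += 1
    return rejections / n_series
```

With independent error the returned proportion should fall close to .05, apart from sampling fluctuation; substituting correlated errors for `rng.gauss` is what allows the effect of sequential dependency on the rejection rate to be examined.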

Each generated time series could be varied with respect to (a) the
number of pre- and post-change occasion values, (b) the slope of the "true"
line, (c) the degree of autocorrelation between points, and (d) the total
error variance about the true line values. In the present study the following
combinations of pre- and post-change occasion values, "true" line slopes,
autocorrelated error, and error variance were used:

(a) Pre- and post-change sample sizes of 10, 20, and 100 were used to
yield time series with the following number of total occasion values:

Total Occasions, N = 20 (pre = 10; post = 10)

Total Occasions, N = 40 (pre = 20; post = 20)

Total Occasions, N = 200 (pre = 100; post = 100)

In the presentation of the results these three conditions of sampling
are referred to in terms of the total occasions in the series as
N = 20, N = 40, and N = 200. The degrees of freedom and critical
values for the Mood test of significance are, however, based only
upon the pre-change points of 10, 20, and 100. For the Walker-Lev
Test 3 and Double Extrapolation Technique the total series points of
20, 40, and 200 are used.

(b) The slope of the true line was specified as 0.0, 1.0, and 20.0.

(c) The degree of autocorrelated error was specified as zero (independent
error only), one, two, and three (correlated error for measurements
one, two, and three time lags apart).

(d) Total error variance about the true line was specified at 1.00 and
5.00 and normal random error was drawn from Gaussian distributions
of zero mean and appropriate standard deviations to yield equal
error variance about the true line for all degrees of autocorrelated
error.

For the total error variance specified as 1.00, the standard deviations
of the normal distributions from which the normal random
errors were drawn would be

1.00 (unique error only)

.58 (unique plus error of lag 1)

.50 (unique plus error of lag 2)

.45 (unique plus error of lag 3)

For a total error variance specified as 5.00, the standard deviations
of the normal distributions from which the normal random errors were
drawn would be

2.24 (unique error only)

1.29 (unique plus error of lag 1)

1.12 (unique plus error of lag 2)

1.00 (unique plus error of lag 3)
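
The text does not spell out how the correlated errors were assembled from the normal random numbers. One construction that reproduces the standard deviations listed above, and that also gives first-order autocorrelations of .33, .50, and .60 for lags one, two, and three, is a unique component plus a moving sum of shared components, all drawn with the same standard deviation. The Python sketch below implements that construction as an assumption; it should not be read as the authors' documented procedure, and the function names are hypothetical.

```python
import math
import random


def component_sd(total_variance, lag):
    """SD of each equal-variance component so the total variance is as specified.

    ASSUMPTION: lag 0 uses one (unique) component; lag k >= 1 uses one
    unique component plus k + 1 shared components, i.e. k + 2 in all.
    """
    n_components = 1 if lag == 0 else lag + 2
    return math.sqrt(total_variance / n_components)


def expected_r1(lag):
    """First-order autocorrelation implied by the assumed construction."""
    return 0.0 if lag == 0 else lag / (lag + 2)


def generate_errors(n, total_variance, lag, rng):
    """Generate n errors with proximal autocorrelation out to `lag` steps."""
    sd = component_sd(total_variance, lag)
    unique = [rng.gauss(0.0, sd) for _ in range(n)]
    if lag == 0:
        return unique
    # Neighbouring errors share components, so nearer points are more
    # strongly related and points more than `lag` apart share nothing.
    shared = [rng.gauss(0.0, sd) for _ in range(n + lag)]
    return [unique[i] + sum(shared[i:i + lag + 1]) for i in range(n)]
```

For instance, `component_sd(5.0, 1)` rounds to 1.29 and `component_sd(1.0, 3)` rounds to .45, matching the listed values, and errors two occasions apart under lag 1 share no components and are therefore uncorrelated.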

One thousand sets of time series were generated for each type of
autocorrelated error in various combinations with the options of sample size,
true line slope, and total error variance listed in a, b, and d above. The
following tests of significance were performed on each of the 1000 sets of
generated data: Mood test, Walker-Lev Test 3, Double Extrapolation
technique. (The Walker-Lev Tests 1 and 2 and a test proposed by Clayton and
described by Campbell (1963, p. 225) were performed on a smaller sample of
100 sets of generated time series.)

For each test of significance, the percent of t's or F's which exceeded
the theoretical 1% and 5% values was determined. The complete sequence of
operations was performed internally on the IBM 709 computer, previously
programmed to perform the necessary operations. The basic steps in the
computation procedures are summarized below:

(1) The "true" line pre- and post-X occasion values were determined on
the basis of desired slope. Normal random errors yielding specified
autocorrelation were added to the "true" line points to form the set
of time series data. (A binary subprogram from the Vogelback
Computing Center (NU-0044) was used for the generation of normal random
numbers. The mean of the Gaussian distribution approximates zero;
the standard deviation was specified as described in (d) above.)
The program generated 1000 sets of time series for each type of
error fluctuation specified.

(2) F and t values for the Mood, Walker-Lev 3, and Double Extrapolation
tests of significance and serial correlation coefficients r(1), r(2),
r(3), and r(4) were computed for each of the 1000 sets of generated
time series data. A data plot of the time series could also be
obtained for each set.

(3) The 1000 t and F values for the Mood, Walker-Lev 3, and Double
Extrapolation tests were sorted on the basis of magnitude. The ordered t
and F values and/or the percents of t's and F's above the tabled critical
values were printed out. The correlations between the t and F
values and the autocorrelation coefficients were determined.

RESULTS

The results of the present study indicated that the three main tests
of significance (Mood test, Walker-Lev Test 3, and Double Extrapolation
Technique) are appropriate for use on data of interrupted time series form.
However, the results also indicated that when statistical dependency between
measurements exists, use of these tests with the tabled critical values will
yield significant results by chance alone more than the expected one percent
and five percent of the time. This was particularly true for the Walker-Lev
Test 3 and the Double Extrapolation Technique.

These results are summarized in Table 1 where the percent of F's and
t's above the tabled 1% and 5% critical values are given for the three
statistical tests of significance. These percent values were obtained from
the number of instances of significance in 1000 sets of generated Monte Carlo
time series. Four degrees of dependency between points and total numbers of
time series occasion values of 20, 40, and 200 are represented in Table 1
(for a total N of 200, only the independent error and the lag three
correlated error Monte Carlo generations were available). As indicated in
Table 1, when no significant autocorrelation exists between points, that is,
the errors are independent, the alpha values are approximately equal to the
theoretical values expected. When the assumption of independence is not met
(correlated error), the percent of significant time series exceeds the
expected one percent and five percent values. The percent of significant
instances increases with increasing autocorrelation between points,
particularly for the Walker-Lev Test 3 and Double Extrapolation Technique.

Insert Table 1 about here

*Each percent based on number of significant instances in 1000 sets of
generated time series of true line slope = 1.00 and total error variance = 5.00.

Supplementary Note:

Since the research reported herein was done, a highly relevant
test of significance has been reported:

G. E. P. Box and George C. Tiao, "A change in level of a non-stationary
time series," Biometrika, 1965, 52, 181-192.

In particular, the Box and Tiao approach avoids the assumption of linearity,
in exchange for other, probably more reasonable, assumptions. What is
needed is a computer program for the exact distribution computation by
numerical methods for their test when its parameters are unknown
(pp. 189-191), plus a testing of the formula against null models such as
those used here, plus a testing of our linear formulas against the null data
generated according to the Box and Tiao assumptions.

The slope of the true line and the total error variance about the true
line had no effect on the percent of F's and t's above the tabled critical
values. The percents of false positives were similar for true line slopes
of 0, 1, and 20 and for total error variances of 1.00 and 5.00. These
results were obtained for all three tests of significance.

Although the true line slope and total error variance had no effect on
the number of false positives, the total number of occasions in the time
series did produce an effect. As can be observed in Table 1, the percent of
significant instances tends to vary with the total N of the series. This is
particularly evident for the Walker-Lev Test 3 and Double Extrapolation
Technique performed on time series with correlated error for three lags. The
greater error in regression estimates with smaller N's is most likely the
crucial factor in this effect.

The effect of the total number of occasion values in the time series
was also evident in departures of the obtained average autocorrelation
coefficients from their expected values. The expected standard deviation,
$\sigma_r$, of $r_a$ is given by

$$\sigma_r = \frac{1 - r_a^2}{\sqrt{N_s - 1}}$$

where $N_s$ refers to the sample size of 1000 in the present
study. When expected values were compared with the obtained averages (over
1000 sets of generated time series) the obtained averages were less than the
expected values for the smaller N's of 20 and 40. The obtained averages
approached the expected values for N of 200. Table 2 indicates this
relationship between expected value and obtained averages for the first
autocorrelation coefficient $r_1$. Similar results were obtained for the
autocorrelation coefficients $r_2$, $r_3$, $r_4$.

Insert Table 2 about here 

On the basis of the Monte Carlo findings of elevated alpha values for
time series exhibiting proximally correlated error, we present in Table 3
new critical values for the three tests of significance. These new critical
values approximate more closely the expected one and five percent
significance levels. The new critical values were obtained by the method of
basic linear interpolation from the IBM printout giving the "Percent of F's
and t's in Intervals of specific widths" for each test of significance. The
values are based on Monte Carlo generations of 1000 sets of time series for
each type of error fluctuation. As shown in Table 3, when time series
observations are not correlated (assumption of independence met) the
critical values which yield significant results one and five percent of the
time by chance alone are, in general, similar to the tabled critical values
for all three statistical tests of significance. However, when a significant
autocorrelation exists between points, that is, when a given observation is
dependent upon preceding observations, the critical values yielding
significant results one and five percent of the time are similar to the
tabled values for the Mood test only. For both the Walker-Lev Test 3 and
Double Extrapolation Technique, the critical values increase with increasing
autocorrelation.

Insert Table 3 about here
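
The idea of reading a new critical value off the simulated null distribution can be sketched in Python. The authors interpolated within the printed interval percentages; the sketch below swaps in a closely related technique, linear interpolation between adjacent ordered values, and the function name `empirical_critical_value` is hypothetical.

```python
def empirical_critical_value(values, alpha):
    """Value exceeded by a proportion `alpha` of the simulated statistics.

    Absolute values are used so the same routine serves two-tailed t's
    and (non-negative) F's. Interpolation is linear between the two
    ordered values that bracket the (1 - alpha) position.
    """
    ordered = sorted(abs(v) for v in values)
    if not ordered:
        raise ValueError("no simulated statistics supplied")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    position = (1.0 - alpha) * (len(ordered) - 1)
    lower = int(position)                      # index just below the position
    upper = min(lower + 1, len(ordered) - 1)   # index just above it
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])
```

Applied to the 1000 t or F values generated under a given type of correlated error, `empirical_critical_value(values, 0.05)` and `empirical_critical_value(values, 0.01)` would give approximate five and one percent critical values of the kind tabled.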

Table 2

Means and Standard Deviations of the First Autocorrelation Coefficient r_1

                                     Obtained Values             Expected
Error Type                     N = 20    N = 40    N = 200        Values

Independent           Mean      -.11      -.05      -.01           .00
                      SD         .21       .16       .07           .03

Correlated Lag 1      Mean       .18       .26       --            .33
                      SD         .20       .13       --            .03

Correlated Lag 2      Mean       .31       .41       --            .50
                      SD         .22       .14       --            .02

Correlated Lag 3      Mean       .37       .49       .58           .60
                      SD         .24       .14       .06           .02

Table 3

Critical Values Yielding One and Five Percent Significance Levels

The new critical values (Table 3) are plotted in Figures 1-6 as a
function of the obtained autocorrelation coefficient. As discussed above,
the obtained average $r_1$ is less than the expected value due to greater
error in regression estimates with small N. Figures 1-6 may be used to obtain
approximate one percent and five percent critical values when correlated
error exists between points. The new critical values are found by
interpolating on the abscissa using the obtained first autocorrelation
coefficient and on the face of the graph using N. For the Mood test N refers
to the number of pre-change points; for the Walker-Lev Test 3 and Double
Extrapolation Technique N refers to the total number of occasions in the time
series. The new critical value is read from the ordinate.

Insert Figures 1-6 about here 



Some Utilizations 

The tests of significance which we are offering are being applied in
situations in which the visual impression is the usual basis of
interpretation. In some sense, the tests are designed to imitate such
judgments, making more precise and rationalizing the criteria involved. Thus,
the more variable the pre-change points, the larger the cross-treatment
change must be to "appear" significant. And holding this constant at a small
variability, the longer the sampling of pre- and post-change observations,
the more confident we are that a given trans-treatment shift is truly
exceptional, is more than a coincidence. These ingredients feature
prominently in the tests we have examined.

In the present study, we have examined only linear hypotheses. Many
times a significant effect in terms of linear hypotheses will appear upon
graphic inspection to be a homogeneous curvilinear process with no
discontinuity at the treatment point. Usually one will not have sufficient
degrees of freedom to test curvilinear hypotheses. For these and other
reasons, it seems best to accompany tests of significance with graphic
presentations.

In Figures 7, 8, 9, and 10 we present "significant" time series
generated by the Monte Carlo of null conditions, one each for independent
error and correlated error of one, two, and three lags. These graphs give
some indication of the effect of proximal autocorrelations. In all four
figures, the true line slope is 1.0, the total error variance about the true
line is 5.00, and the total number of occasions is 40 (20 pre-change and 20
post-change). The specific data on the tests of significance are as follows:

Insert Figures 7, 8, 9, and 10 about here

For Figure 7 (independent error) the t value for the Mood test was 2.55, the
F value for the Walker-Lev Test 3 was 6.11, and the t value for the Double
Extrapolation Technique was 2.45. The values of the autocorrelation
coefficients based on departures from the line of best fit for the total
series were $r_1 = .07$, $r_2 = -.24$, $r_3 = -.08$, $r_4 = -.07$. In Figure 8
(correlated error of lag 1) the t value for the Mood test was 2.34, the F
value for the Walker-Lev Test 3 was 6.13, and the t value for the Double
Extrapolation Technique was 2.45. The values of the autocorrelation
coefficients were $r_1 = .45$, $r_2 = -.22$, $r_3 = -.44$, $r_4 = -.26$. In
Figure 9 (correlated error of lag 2) the t value for the Mood test was 3.86,
the F value for the Walker-Lev Test 3 was 4.66, and the t value for the
Double Extrapolation Technique was 2.16. The values of the autocorrelation
coefficients were $r_1 = .51$, $r_2 = .25$, $r_3 = -.06$, $r_4 = -.15$. In
Figure 10 (correlated error of lag 3) the t value for the Mood test was 3.54,
the F value for the Walker-Lev Test 3 was 7.45, and the t value for the
Double Extrapolation Technique was 2.88. The values of the autocorrelation
coefficients were $r_1 = .76$, $r_2 = .50$, $r_3 = .20$, $r_4 = -.14$.

Fig. 7. A significant Monte Carlo generated time series for independent error.

Fig. 8. A significant Monte Carlo generated time series for correlated error
of one lag.

Fig. 9. A significant Monte Carlo generated time series for correlated error
of two lags.

To further illustrate utilizations of these tests of significance,
in Figures 11, 12, 13, and 14 we present actual data. In general, these
figures represent instances in which the tests of significance confirm
judgments of effect which had been made on the basis of visual inspection
alone.

Insert Figures 11, 12, 13, and 14 about here

Figures 11 and 12 represent Chicago crime rate statistics for the
categories of "Larceny under $50" and "Murder and non-negligent manslaughter,"
respectively (from Uniform Crime Reports for the U.S., 1942-1962). On
February 22, 1960, it was announced that Orlando Wilson had been selected as
Superintendent of the Chicago Police Force (New York Times, February 23,
1960, page 1). He became acting commissioner on March 2, 1960. It was
subsequently reported that "Recorded crime in Chicago increased 83.7% in the
first 10 months of 1960, police statistics indicated today. The officials
refused to say how much of the increase was due to a higher crime rate and
how much to improved record keeping by police" (New York Times, December 17,
1960, page 34). In Figure 11, all statistical tests indicate that the
observed increase in recorded "Larceny under $50" was significant. The t
value for the Mood test was 12.54, the F value for the Walker-Lev Test 1 was
51.59, the F value for the Walker-Lev Test 3 was 157.76, and the t value for
the Double Extrapolation Technique was 9.12. The value of the first
autocorrelation coefficient based on departures from separate pre- and
post-change regression lines was .43. However, in Figure 12 where the effect
seems to start several observations before Orlando Wilson's appointment,
none of the statistical tests were significant. The t value for the Mood
test was 1.10, the F values for the Walker-Lev Tests 1 and 3 were .03 and
2.69, respectively, and the t value for the Double Extrapolation Technique
was .82. The value of the first autocorrelation coefficient based on
departures from the total series regression line was .15.

Fig. 11. Number of reported larcenies under $50, Chicago, Illinois, 1942-1962
(from Uniform Crime Reports for the United States, 1942-62). Ordinate: number
of reported offenses; abscissa: year.

Fig. 12. Number of reported murders and non-negligent manslaughters, Chicago,
Illinois, 1942-1962 (from Uniform Crime Reports for the United States,
1942-62).

Fig. 13. Number of hospitalized mental patients in the United States before
and after the advent of tranquilizing drugs (from Britannica Book of the
Year, 1965).

Fig. 14. Reliability coefficients for the structure dimension of the
Leadership Opinion Questionnaire for training sessions before and after the
assassination of President Kennedy (from Ayers, 1964).

Figure 13 represents the number of hospitalized mental patients in the
United States before and after the advent of the use of tranquilizing drugs
(from Britannica Book of the Year, 1965). The significant statistical tests
were the Walker-Lev Test 2 (F = 213.75) and the Mood test (t = 7.40). The F
value for the Walker-Lev Test 3 was .26 and the t value for the Double
Extrapolation Technique was 1.38. The value of the first autocorrelation
coefficient based on differences from the separate pre- and post-change
regression lines was .52.
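
As a rough illustration of the coefficient quoted for each figure, the
following Python sketch fits separate least-squares lines to the pre- and
post-change segments and computes a lag-1 autocorrelation from the pooled
departures. The function names, the use of NumPy, and the particular
estimator (sum of adjacent cross-products divided by the sum of squares) are
assumptions made for illustration; the exact formula used in the study is
not restated in this section.

```python
import numpy as np

def departures_from_separate_lines(t, y, n_pre):
    """Residuals about separate pre- and post-change least-squares lines.

    n_pre is the number of pre-change observations (hypothetical parameter).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    pieces = []
    for seg in (slice(0, n_pre), slice(n_pre, None)):
        coef = np.polyfit(t[seg], y[seg], 1)          # [slope, intercept]
        pieces.append(y[seg] - np.polyval(coef, t[seg]))
    return np.concatenate(pieces)

def lag1_autocorrelation(e):
    """Assumed estimator: sum(e[i] * e[i+1]) / sum(e[i]**2) on centered e."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    return float(np.sum(e[:-1] * e[1:]) / np.sum(e ** 2))

if __name__ == "__main__":
    # Made-up series: 10 pre-change and 10 post-change points with a jump.
    rng = np.random.default_rng(1)
    t = np.arange(1.0, 21.0)
    y = 0.5 * t + np.where(t > 10, 4.0, 0.0) + rng.normal(size=t.size)
    print(lag1_autocorrelation(departures_from_separate_lines(t, y, 10)))
```

Pooling the two sets of departures means that one of the adjacent pairs
straddles the change point; whether the study treated that pair differently
is not known from the text.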

Figure 14 represents reliability coefficients for the Structure Dimension
of a Leadership Opinion Questionnaire which was administered at the
beginning and end of each of eight week-long training sessions. The
training session groups ranged in size from 22 to 55 participants. Classes for
the week-long sessions were given in the Spring and Autumn months of 1963
and are indicated in chronological order on the abscissa of the graph. For
the eighth training session, pretesting took place on November 18, 1963,
whereas posttesting occurred on Friday afternoon, November 22, at the close
of the session. During the lunch period, 1:30-2:30 P.M., the participants
had watched and listened to the memorable events being reported from Dallas,
Texas (from Ayers, 1964). Since only one post-change point is given, only
the Mood test of significance is applicable. The obtained t value for the
Mood test was 6.24 with 5 df, thus confirming the impression of effect. The
value of the first autocorrelation coefficient based on differences from the
separate pre- and post-change regression lines was .12.

As the above examples of actual time series illustrate, the tests of
significance are not equally applicable to all time series data. The several
possible time series of Figure 15 indicate this.

Insert Figure 15 about here

The time series of A1 and A2 are instances of sustained post-change effect
involving a change in intercept, whereas B1 and B2 show an initial jump and
subsequent return to pre-change conditions. If theoretical expectations are
appropriate to using a series of points in the post-change period (as they
would be in A1 and A2), the Walker-Lev Test 3 and Double Extrapolation
Technique would be most suitable. These two tests would not be used in
instances of theoretical expectations similar to B1 and B2, where only the
Mood test of the significance of the departure of the first post-change
observation from a trend extrapolated from the pre-change observations
would be appropriate.
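
The comparison of a single post-change observation with the extrapolated
pre-change trend can be sketched as follows. This assumes the Mood test
takes the usual form of a regression prediction-error t statistic with n - 2
degrees of freedom, where n is the number of pre-change points; the function
name and the example values are hypothetical, and the study's own formula is
not restated in this section.

```python
import numpy as np

def mood_t(t_pre, y_pre, t_post, y_post):
    """t statistic for one new observation against a pre-change trend line.

    Assumed form: (observed - predicted) divided by the standard error of
    predicting a single new observation from a fitted straight line.
    Returns (t, degrees of freedom).
    """
    t_pre = np.asarray(t_pre, dtype=float)
    y_pre = np.asarray(y_pre, dtype=float)
    n = t_pre.size
    slope, intercept = np.polyfit(t_pre, y_pre, 1)
    resid = y_pre - (intercept + slope * t_pre)
    s2 = np.sum(resid ** 2) / (n - 2)                 # residual variance
    sxx = np.sum((t_pre - t_pre.mean()) ** 2)
    se = np.sqrt(s2 * (1.0 + 1.0 / n + (t_post - t_pre.mean()) ** 2 / sxx))
    predicted = intercept + slope * t_post
    return float((y_post - predicted) / se), n - 2

if __name__ == "__main__":
    # Made-up values: five pre-change points, one post-change point.
    t_stat, df = mood_t([1, 2, 3, 4, 5], [1, 2, 1, 2, 1], 6, 5)
    print(t_stat, df)
```

Because only the pre-change points enter the fitted line, the statistic says
nothing about whether the shift persists, which is why it suits patterns
such as B1 and B2.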

Fig. 15. Some possible outcome patterns for the interrupted time series
experiment.

A time series analysis seems most suitable in the instances discussed
above. In C1, C2, and C3, where the post-treatment effect involves a change
in slope and, generally, of intercept, the possibility of a curvilinear
relationship or cyclical trend is more difficult to rule out. A large number
of pre- and post-treatment observations would be helpful in eliminating these
possibilities. All three statistical tests and, in particular, the
Walker-Lev Test 1, may be used, but they may yield conflicting results. For
example, the Double Extrapolation Technique can be used to indicate if the
pre- and post-regression lines coincide at time t0 midway between the last
pre-change and first post-change point. However, extension of the regression
lines of C2 will show that the two lines coincide at time t0, although the
pre- and post-regression lines are dissimilar. Similarly, the first
post-change point may not differ greatly from that predicted by the
pre-change values, yet a continuing and substantial increase or decrease in
subsequent post-change values may suggest an effect.
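
The C2 situation can be illustrated numerically. The sketch below
extrapolates separately fitted pre- and post-change lines to the midpoint t0
and reports the two predicted values and the two slopes. It covers only the
point comparison, not the t statistic of the Double Extrapolation Technique,
and the function name and the noise-free example series are hypothetical.

```python
import numpy as np

def extrapolate_to_midpoint(t, y, n_pre):
    """Fit pre- and post-change lines and evaluate both at t0.

    t0 is taken midway between the last pre-change and the first
    post-change time point. Returns a dict of the quantities of interest.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t0 = (t[n_pre - 1] + t[n_pre]) / 2.0
    pre = np.polyfit(t[:n_pre], y[:n_pre], 1)         # [slope, intercept]
    post = np.polyfit(t[n_pre:], y[n_pre:], 1)
    return {
        "t0": float(t0),
        "pre_at_t0": float(np.polyval(pre, t0)),
        "post_at_t0": float(np.polyval(post, t0)),
        "pre_slope": float(pre[0]),
        "post_slope": float(post[0]),
    }

# A C2-like series: the slope changes from 0.5 to 2.0 at t0 = 10.5, but the
# two lines meet there, so there is no gap between them at t0.
t = np.arange(1.0, 21.0)
y = np.where(t < 10.5, 2.0 + 0.5 * t, 7.25 + 2.0 * (t - 10.5))
summary = extrapolate_to_midpoint(t, y, n_pre=10)

if __name__ == "__main__":
    print(summary)
```

In such a series a comparison made only at t0 finds no difference, while a
test of slope differences would, which is one way the tests can disagree.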

In instances of delayed effect, such as D1, ambiguity is introduced
into the interpretation of significance (as the time interval between the
treatment and its effect increases, the plausibility of rival hypotheses also
increases). However, if the experimenter specifies in advance the exact
relationship between the introduction of the treatment and the manifestation
of its effect, the pattern indicated by time series D1 could be almost as
definitive as that in which immediate effect is expected.

Footnotes

*A preliminary investigation of the Clayton Test using 100 sets of
generated time series (total N = 40, true line slope = 1.00, total error
variance = 5.00) yielded alpha values greater than the expected one and five
percent even when no dependency between points was built in (independent
error). The percentages of significant instances exceeding the five percent
tabled critical value of F were 17%, 36%, 53%, and 61% for independent
error, lag 1, lag 2, and lag 3 correlated errors, respectively. In view of
this preliminary finding, the Clayton Test was not considered a useful
significance test in the interrupted time series situation and it was
eliminated from further Monte Carlo investigation.

In a similar preliminary investigation, the Walker-Lev Test 1 yielded
4% (independent error), 16% (lag 1 error), 26% (lag 2 error), and 34% (lag 3
error) false-positive instances above the five percent tabled critical value
of F. Although the Walker-Lev Test 1 yielded results similar to those
obtained in a preliminary investigation of the Walker-Lev Test 3, it was
eliminated in the final 1000-set investigation. Preference was given to the
Walker-Lev Test 3 because of the more restricted usefulness of Test 1. Since
the Walker-Lev Test 1 is a test of slope differences only, it is more
difficult to rule out rival curvilinear hypotheses in cases of significance.
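
The kind of preliminary check described in this footnote can be sketched in
Python. Several details are assumptions rather than the study's procedure:
Test 1 is taken to be the usual F test for equality of two regression slopes
(1 and N - 4 df), lag-k correlated error is built as a moving sum of k + 1
independent normal deviates, the change point is placed at the middle of the
series, and 2000 sets are generated per condition. The series parameters
(N = 40, slope 1.00, error variance 5.00) follow the preceding footnote.

```python
import numpy as np

F_CRIT_1_36 = 4.11  # tabled 5% critical value of F with 1 and 36 df

def make_errors(rng, n, lag, variance):
    """Errors with dependency over `lag` neighbours (lag=0: independent).

    ASSUMPTION: a moving sum of lag + 1 independent normals, rescaled so
    that each error has the requested variance.
    """
    u = rng.normal(size=n + lag)
    e = np.convolve(u, np.ones(lag + 1), mode="valid")   # length n
    return e * np.sqrt(variance / (lag + 1))

def slope_difference_F(t, y, n_pre):
    """F for equal pre- and post-change slopes (assumed form of Test 1)."""
    rss_full, sxy, sxx, syy = 0.0, 0.0, 0.0, 0.0
    for seg in (slice(0, n_pre), slice(n_pre, None)):
        dt = t[seg] - t[seg].mean()
        dy = y[seg] - y[seg].mean()
        sxy_g, sxx_g, syy_g = np.sum(dt * dy), np.sum(dt ** 2), np.sum(dy ** 2)
        rss_full += syy_g - sxy_g ** 2 / sxx_g           # separate slopes
        sxy, sxx, syy = sxy + sxy_g, sxx + sxx_g, syy + syy_g
    rss_reduced = syy - sxy ** 2 / sxx                   # common slope
    return (rss_reduced - rss_full) / (rss_full / (len(y) - 4))

def false_positive_rate(lag, n_sets=2000, n=40, n_pre=20,
                        slope=1.0, variance=5.0, seed=0):
    """Share of no-effect series whose F exceeds the tabled 5% value."""
    rng = np.random.default_rng(seed)
    t = np.arange(1.0, n + 1.0)
    hits = 0
    for _ in range(n_sets):
        y = slope * t + make_errors(rng, n, lag, variance)
        hits += slope_difference_F(t, y, n_pre) > F_CRIT_1_36
    return hits / n_sets

if __name__ == "__main__":
    for lag in range(4):
        print(lag, false_positive_rate(lag))
```

Under these assumptions the independent-error rate should fall near the
nominal five percent and the rate should rise with lag, which is the
qualitative pattern reported above; the exact percentages will differ from
the study's because the error-generating scheme is only assumed.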

The Walker-Lev Test 2 was not used because a test of the null hypothesis
of zero slope is not of general interest as a test of significance in
the time series situation.